74 research outputs found

    Extracting user spatio-temporal profiles from location based social networks

    Location-Based Social Networks (LBSN) such as Twitter or Instagram are a good source of data on users' spatio-temporal behavior. These social networks provide a low-rate sampling of users' location information over long intervals of time, which can be used to discover complex behaviors, including mobility profiles, points of interest, or unusual events. This information is important for domains such as mobility route planning, tourist recommendation systems, or city planning. Other approaches have used LBSN data to categorize areas of a city according to the categories of the places that people visit, or to discover user behavioral patterns from their visits. The aim of this paper is to analyze how the spatio-temporal behavior of a large number of users in a well-delimited geographical area can be segmented into different profiles. These behavioral profiles are obtained by means of clustering algorithms that show the different behaviors that people have when living in and visiting a city. The data analyzed were obtained from the public data feeds of Twitter and Instagram within the area of the city of Barcelona over a period of several months. The analysis of these data shows that this kind of algorithm can be successfully applied to data from any city (or any general area) to discover useful profiles that can be described in terms of the city's singular places and areas and their temporal relationships. These profiles can be used as a basis for making decisions in different application domains, especially those related to mobility inside and outside a city.
    Research report. Preprint.

    K-means vs Mini Batch K-means: a comparison

    Mini Batch K-means (Sculley, 2010) has been proposed as an alternative to the K-means algorithm for clustering massive datasets. Its advantage is a lower computational cost: each iteration uses a fixed-size subsample rather than the whole dataset. This strategy reduces the number of distance computations per iteration at the cost of lower cluster quality. The purpose of this paper is to perform empirical experiments using artificial datasets with controlled characteristics to assess how much cluster quality is lost when applying this algorithm. The goal is to obtain guidelines on the circumstances in which this algorithm is best applied and on the maximum gain in computational time that can be achieved without compromising the overall quality of the partition.
    Preprint.
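The subsampling idea described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the code evaluated in the paper: the function names, the default batch size, and the iteration count are assumptions chosen for readability, and the per-center learning rate of 1/count follows the commonly described form of the Mini Batch K-means update.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mini_batch_kmeans(points, k, batch_size=20, iterations=200, seed=0):
    """Illustrative Mini Batch K-means; all parameter defaults are assumptions."""
    rng = random.Random(seed)
    # Initialise the centers with k distinct points drawn from the data.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points each center has absorbed so far
    for _ in range(iterations):
        # Only a fixed-size subsample is examined in each iteration.
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign every batch point to its nearest center (centers held fixed).
        nearest = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                   for p in batch]
        # Move each center towards its assigned points with step 1 / count.
        for p, j in zip(batch, nearest):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]
    return centers
```

Each iteration costs `batch_size * k` distance computations instead of `len(points) * k`, which is the source of the time saving; the price is that centers are estimated from noisier assignments.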

    Strategies and algorithms for clustering large datasets: a review

    The exploratory nature of data analysis and data mining makes clustering one of the most common tasks in this kind of project. Increasingly, these projects come from many different application areas, such as biology, text analysis, or signal analysis, that involve ever larger datasets in both the number of examples and the number of attributes. Classical clustering methods such as K-means or hierarchical clustering are reaching the limit of their capacity to cope with this increase in dataset size. The limitations of these algorithms come either from the need to store all the data in memory or from their computational time complexity. These problems have opened an area of research into algorithms able to reduce this data overload. Some solutions come from data preprocessing, either by transforming the data to a lower-dimensional manifold that represents the structure of the data or by summarizing the dataset with a smaller subset of examples that carries equivalent information. A different perspective is to modify the classical clustering algorithms, or to derive new ones, so that they can cluster larger datasets. This perspective relies on many different strategies. Techniques such as sampling, on-line processing, summarization, data distribution, and efficient data structures have been applied to the problem of scaling clustering algorithms. This paper presents a review of different strategies and clustering algorithms that apply these techniques. The aim is to cover the range of methodologies applied to clustering data and how they can be scaled.
    Preprint.

    Mining frequent spatio-temporal patterns from location based social networks

    Location-Based Social Networks (LBSN) such as Twitter or Instagram are a good source of data on users' spatio-temporal behavior. These social networks provide a low-rate sampling of users' location information over long intervals of time, which can be used to discover complex behaviors, including frequent routes, points of interest, or unusual events. This information is important for domains such as route planning, tourist recommendation systems, or city planning. Other approaches have used LBSN data to categorize areas of a city according to the categories of the places that people visit, or to discover user behavioral patterns from their visits. The aim of this paper is to analyze the frequent spatio-temporal patterns that users share when visiting a city. This behavior is studied in a well-delimited geographical area by means of frequent itemset algorithms, in order to establish some causal dependence between visits that can be interpreted as interesting routes or spatio-temporal connections. The data analyzed were obtained from the public data feeds of Twitter and Instagram within the areas of the cities of Barcelona and Milan over a period of several months. The analysis of these data shows that this kind of algorithm can be successfully applied to data from any city (or general area) to discover useful patterns that can be interpreted in terms of the city's singular places and areas, and that these patterns can be used as the elements of a knowledge base for different applications.
    Research report. Preprint.
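As a rough illustration of what a frequent itemset algorithm computes on visit data, the sketch below runs a small level-wise (Apriori-style) search. It is not the method used in the report: the function name, the `max_size` cap, and the idea of treating the set of places one user visits in one day as a transaction are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: support count} for itemsets seen in at least
    min_support transactions. Level-wise, Apriori-style search (a sketch)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = {}
    size = 1
    while current:
        result.update(current)
        if size == max_size:
            break
        size += 1
        items = sorted({i for s in current for i in s})
        # Keep a candidate only if all of its (size - 1)-subsets are frequent.
        candidates = [
            frozenset(c) for c in combinations(items, size)
            if all(frozenset(sub) in current
                   for sub in combinations(c, size - 1))
        ]
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:  # candidate is fully contained in the transaction
                    counts[c] += 1
        current = {c: n for c, n in counts.items() if n >= min_support}
    return result
```

With hypothetical transactions such as `[{"Sagrada Familia", "Park Guell"}, {"Sagrada Familia", "Camp Nou"}, ...]`, a pair that survives the support threshold can be read as a candidate connection between two places that many users visit together.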

    Unsupervised feature selection by means of external validity indices

    Feature selection for unsupervised data is a difficult task because no reference partition is available to evaluate the relevance of the features. Recently, several proposed methods for consensus clustering have used external validity indices to assess the agreement among partitions obtained by clustering algorithms with different parameter values. These indices are independent of the characteristics of the attributes describing the data, the way the partitions are represented, and the shape of the clusters. This independence allows these measures to be used to assess the similarity of partitions obtained with different subsets of attributes. As in supervised feature selection, the goal of unsupervised feature selection is to preserve the patterns of the original data with less information. The hypothesis of this paper is that the clustering of the dataset with all the attributes, even when its quality is not perfect, can be used as the basis for a heuristic exploration of the space of feature subsets. The proposal is to use external validation indices as the measure of how well this information is preserved by a subset of the original attributes. Different external validation indices have been proposed in the literature. This paper presents experiments using the adjusted Rand, Jaccard, and Fowlkes-Mallows indices. Artificially generated datasets are used to test the methodology under different experimental conditions, such as the number of clusters, the spatial separation of the clusters, and the ratio of irrelevant features. The methodology is also applied to real datasets chosen from the UCI machine learning repository.
    Preprint.
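To make the role of an external validity index concrete, here is a minimal sketch of the adjusted Rand index computed from two label sequences. In the setting described above, one sequence would be the partition obtained with all attributes and the other the partition obtained with a candidate subset; that usage, the function name, and the handling of degenerate cases are assumptions of this example rather than details taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same examples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same examples")
    n = len(labels_a)
    if n < 2:
        return 1.0  # assumption: a single example is trivially in agreement
    # Pairs of examples grouped together in both partitions.
    together_both = sum(comb(c, 2)
                        for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together in each partition separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # assumption: degenerate partitions count as identical
    return (together_both - expected) / (maximum - expected)
```

The index depends only on which examples share a cluster, not on the label values or the attributes used, which is the independence property the abstract relies on.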

    Wind energy forecasting with neural networks: a literature review

    Renewable energy is intermittent by nature, and accurate forecasting of renewable energy generation is critical to integrating this energy into the grid while assuring safety and stability. Wind energy prediction is based on the ability to forecast wind. There are many methods for wind forecasting based on the statistical properties of the wind time series and on the integration of meteorological information, and these methods are used commercially around the world. A new family of methods for wind power forecasting, based on machine learning and deep learning techniques, is now emerging. This paper analyses the characteristics of wind speed time series data and reviews recently published work on wind power forecasting using machine learning approaches (neural and deep learning networks).
    Peer reviewed. Postprint (published version).

    “Dust in the wind...”, deep learning application to wind energy time series forecasting

    Balancing electricity production and demand requires extensive use of different prediction techniques. Renewable energy, due to its intermittency, increases the complexity and uncertainty of forecasting, and the resulting accuracy affects all the players in electricity systems around the world, such as generators, distributors, retailers, and consumers. Wind forecasting follows two major approaches: using numerical meteorological prediction models, or relying purely on time series input. Deep learning is emerging as a new method for wind energy prediction. This work develops several deep learning architectures and shows their performance when applied to wind time series. The models have been tested with the most extensive wind dataset available, the National Renewable Energy Laboratory Wind Toolkit, a dataset with 126,692 wind points in North America. The architectures are based on different approaches: Multi-Layer Perceptron networks (MLP), Convolutional Networks (CNN), and Recurrent Networks (RNN). These architectures have been tested on predictions over a 12-hour-ahead horizon, with accuracy measured by the coefficient of determination (R²). Applying the models to wind sites evenly distributed across North America allows us to draw several conclusions about the relationships between methods, terrain, and forecasting complexity. The results show differences between the models and confirm the superior capabilities of deep learning techniques for wind speed forecasting from wind time series data.
    Peer reviewed. Postprint (published version).
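For readers unfamiliar with the accuracy measure named in this abstract, the coefficient of determination can be written as R² = 1 − SS_res / SS_tot. The sketch below is a plain implementation of that definition; the function name and the choice to raise an error on a constant series are assumptions of this example, and it does not reproduce the evaluation pipeline of the paper.

```python
def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error of the forecast.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R² is undefined for a constant observed series")
    return 1 - ss_res / ss_tot
```

A value of 1 means a perfect forecast, 0 means the forecast is no better than predicting the mean of the observations, and negative values mean it is worse.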

    Go with the flow: Recurrent networks for wind time series multi-step forecasting

    One way of reducing the effects of climate change is to rely on renewable energy sources. Their intermittent nature makes accurate mid- to long-term forecasting necessary. Wind energy prediction is based on the ability to forecast wind speed. This problem has been approached using different methods based on the statistical properties of the wind time series. Wind time series are non-linear and non-stationary, which makes forecasting them very challenging. Deep neural networks have recently proven successful on problems involving sequences with non-linear behavior. In this work, we perform experiments comparing the capability of different neural network architectures for multi-step forecasting, obtaining a 12-hour-ahead prediction using data from the National Renewable Energy Laboratory's WIND dataset.
    Peer reviewed. Postprint (published version).
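Multi-step forecasting of this kind is usually set up by slicing the series into pairs of an input window and a block of future values. The sketch below shows one common way to build such pairs; the function name, the `lag` length, and the assumption of hourly samples are illustrative and are not taken from the paper.

```python
def make_windows(series, lag, horizon):
    """Split a series into (past window, future block) training pairs.

    lag     -- number of past values given to the model (assumed choice)
    horizon -- number of future steps to predict, e.g. 12 for 12 hours
               ahead if the series is sampled hourly (an assumption)
    """
    pairs = []
    for start in range(len(series) - lag - horizon + 1):
        inputs = series[start:start + lag]
        targets = series[start + lag:start + lag + horizon]
        pairs.append((inputs, targets))
    return pairs
```

For a series of 30 values with `lag=6` and `horizon=12`, this yields 13 pairs; a model trained on them outputs all 12 future steps at once, which is one of several possible multi-step strategies.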

    Predicting wind energy generation with recurrent neural networks

    Decarbonizing the energy supply requires extensive use of renewable generation. Its intermittent nature requires accurate forecasts of future generation in the short, mid, and long term. Wind energy generation prediction is based on the ability to forecast wind intensity. This problem has been approached using two families of methods, one based on weather forecasting input (Numerical Weather Model Prediction) and the other based on past observations (time series forecasting). This work deals with the application of deep learning to wind time series. Wind time series are non-linear and non-stationary, which makes forecasting them very challenging. Deep neural networks have recently proven successful on problems involving sequences with non-linear behavior. In this work, we perform experiments comparing the capability of different neural network architectures for multi-step forecasting with a 12-hour-ahead prediction. For the time series input we used the US National Renewable Energy Laboratory's WIND Dataset [3], the largest available wind and energy dataset, with over 120,000 physical wind sites. This dataset is evenly spread across North America, which has allowed us to draw conclusions on the relationship between physical site complexity and forecast accuracy. The preliminary results of this work show a relationship between the error (measured as R²) and the complexity of the terrain, and a better accuracy score for some recurrent neural network architectures.
    Peer reviewed. Postprint (author's final draft).

    Automatic classification of gait patterns using a smart rollator and the BOSS model

    The risk of falling in older adults is a major concern because of its severe consequences for socio-economic and public health systems. Some pathologies cause mobility problems in the aged population, leading to falls and thus to reduced autonomy. Other implications of ageing include changes in gait patterns and walking speed. In this paper, a non-invasive framework is proposed to study gait in older people using data collected by a smart rollator, the i-Walker. The analysis presented in this article uses a feature extraction method and a spectral embedding to represent the information, and Bayesian clustering for knowledge discovery. The algorithm considers raw data from the i-Walker sensors along with the calculated walking speed of each individual, which has already been used in clinical studies to assess the physical and cognitive status of older adults. The results obtained demonstrate that the proposed analysis has the potential to separate the two groups of interest, young people and geriatric patients, into clusters.
    Peer reviewed. Postprint (author's final draft).